[0040] The first column in Table 1 provides a unique "Clone ID NO:Z" for a cDNA
clone related to each contig sequence disclosed in Table 1. This clone ID references the
cDNA clone which contains at least the 5' most sequence of the assembled contig, and at
least a portion of SEQ ID NO:X was determined by directly sequencing the referenced
clone. The reference clone may have more sequence than described in the sequence listing
or the clone may have less. In the vast majority of cases, however, the clone is believed to
encode a full-length polypeptide. In the case where a clone is not full-length, a full-length
cDNA can be obtained by methods known in the art and/or as described elsewhere herein.
[0041] The second column in Table 1 provides a unique "Contig ID" identification
for each contig sequence. The third column provides the "SEQ ID NO:X" identifier for
each of the ovarian associated contig polynucleotide sequences disclosed in Table 1. The
fourth column, "ORF (From-To)", provides the location (i.e., nucleotide position numbers)
within the polynucleotide sequence "SEQ ID NO:X" that delineates the preferred open
reading frame (ORF) shown in the sequence listing and referenced in Table 1, column 5, as
SEQ ID NO:Y. Where the nucleotide position number "To" is lower than the nucleotide
position number "From", the preferred ORF is the reverse complement of the referenced
polynucleotide sequence.
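
By way of illustration only, the "ORF (From-To)" convention described above may be sketched as follows. The function name `extract_orf` is hypothetical, and the sketch assumes that the position numbers are 1-based and inclusive, which the text does not state explicitly.

```python
# Illustrative sketch only; `extract_orf` is a hypothetical helper.
# Assumption: "From" and "To" are 1-based, inclusive nucleotide positions.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def extract_orf(seq: str, orf_from: int, orf_to: int) -> str:
    """Return the nucleotides of the ORF delineated by From/To positions."""
    if orf_to >= orf_from:
        # Forward strand: take the stated interval directly.
        return seq[orf_from - 1:orf_to]
    # "To" lower than "From": take the interval, then reverse complement it.
    segment = seq[orf_to - 1:orf_from]
    return segment.translate(_COMPLEMENT)[::-1]
```

For example, under these assumptions `extract_orf("ATGCAAAT", 6, 1)` reads positions 1 to 6 and returns their reverse complement.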

[0042] The fifth column in Table 1 provides the corresponding SEQ ID NO:Y for
the polypeptide sequence encoded by the preferred ORF delineated in column 4. In one
embodiment, the invention provides an amino acid sequence comprising, or alternatively 
consisting of, a polypeptide encoded by the portion of SEQ ID NO:X delineated by "ORF 
(From-To)". Also provided are polynucleotides encoding such amino acid sequences and 
the complementary strand thereto. 

[0043] Column 6 in Table 1 lists residues comprising epitopes contained in the
polypeptides encoded by the preferred ORF (SEQ ID NO:Y), as predicted using the
algorithm of Jameson and Wolf, (1988) Comp. Appl. Biosci. 4:181-186. The Jameson-Wolf
antigenic analysis was performed using the computer program PROTEAN (Version
3.11 for the Power Macintosh, DNASTAR, Inc., 1228 South Park Street, Madison, WI). In
specific embodiments, polypeptides of the invention comprise, or alternatively consist of, at 
least one, two, three, four, five or more of the predicted epitopes as described in Table 1. It
will be appreciated that depending on the analytical criteria used to predict antigenic
determinants, the exact address of the determinant may vary slightly.
[0044] Column 7 in Table 1 provides an expression profile and library code: count
for each of the contig sequences (SEQ ID NO:X) disclosed in Table 1, which can routinely
be combined with the information provided in Table 4 and used to determine the normal or
diseased tissues, cells, and/or cell line libraries which predominantly express the
polynucleotides of the invention. The first number in column 7 (preceding the colon)
represents the tissue/cell source identifier code corresponding to the code and description 
provided in Table 4. For those identifier codes in which the first two letters are not "AR", 
the second number in column 7 (following the colon) represents the number of times a 
sequence corresponding to the reference polynucleotide sequence was identified in the 
tissue/cell source. Those tissue/cell source identifier codes in which the first two letters are 
"AR" designate information generated using DNA array technology. Utilizing this 
technology, cDNAs were amplified by PCR and then transferred, in duplicate, onto the
array. Gene expression was assayed through hybridization of first strand cDNA probes to
the DNA array. cDNA probes were generated from total RNA extracted from a variety of
different tissues and cell lines. Probe synthesis was performed in the presence of 33P dCTP,
using oligo(dT) to prime reverse transcription. After hybridization, high stringency washing 
conditions were employed to remove non-specific hybrids from the array. The remaining 
signal, emanating from each gene target, was measured using a Phosphorimager. Gene 
expression was reported as Phosphor Stimulating Luminescence (PSL) which reflects the 
level of phosphor signal generated from the probe hybridized to each of the gene targets 
represented on the array. A local background signal subtraction was performed before the 
total signal generated from each array was used to normalize gene expression between the 
different hybridizations. The value presented after "[array code]:" represents the mean of 
the duplicate values, following background subtraction and probe normalization. One of 
skill in the art could routinely use this information to identify normal and/or diseased 
tissue(s) which show a predominant expression pattern of the corresponding polynucleotide 
of the invention or to identify polynucleotides which show predominant and/or specific 
tissue and/or cell expression. The sequences disclosed herein have been determined to be
predominantly expressed in ovarian tissues, including normal and diseased ovarian tissues
(See Table 1, column 7 and Table 4).
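
By way of illustration only, the array data handling described above (local background subtraction, normalization by the total signal of each array, and averaging of the duplicate spots) may be sketched as follows. The function name, the data layout, the clipping of negative values at zero, and the expression of each value as a fraction of total array signal are all assumptions made for this sketch and are not taken from the procedure actually used.

```python
from statistics import mean

# Illustrative sketch only; `normalize_array` and its data layout are hypothetical.


def normalize_array(raw_psl, background):
    """Return gene -> mean normalized signal for one hybridized array.

    raw_psl and background map each gene target to a pair of values,
    one per duplicate spot (PSL reading and local background).
    """
    corrected = {
        # Assumption: negative values after subtraction are clipped to zero.
        gene: [max(signal - bg, 0.0)
               for signal, bg in zip(raw_psl[gene], background[gene])]
        for gene in raw_psl
    }
    total = sum(sum(spots) for spots in corrected.values())
    if total == 0:
        raise ValueError("array has no signal above background")
    # Assumption: each spot is scaled by the total signal of its array,
    # then the duplicate spots are averaged.
    return {gene: mean([value / total for value in spots])
            for gene, spots in corrected.items()}
```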
[0045] Column 8 in Table 1 provides a chromosomal map location for certain
polynucleotides of the invention. Chromosomal location was determined by finding exact
matches to EST and cDNA sequences contained in the NCBI (National Center for
Biotechnology Information) UniGene database. Each sequence in the UniGene database is
assigned to a "cluster"; all of the ESTs, cDNAs, and STSs in a cluster are believed to be
derived from a single gene. Chromosomal mapping data is often available for one or more
sequence(s) in a UniGene cluster; this data (if consistent) is then applied to the cluster as a
whole. Thus, it is possible to infer the chromosomal location of a new polynucleotide
sequence by determining its identity with a mapped UniGene cluster.
[0046] A modified version of the computer program BLASTN (Altschul et al., J.
Mol. Biol. 215:403-410 (1990), and Gish et al., Nat. Genet. 3:266-272 (1993)) was used to
search the UniGene database for EST or cDNA sequences that contain exact or near-exact
matches to a polynucleotide sequence of the invention (the 'Query'). A sequence from the
UniGene database (the 'Subject') was said to be an exact match if it contained a segment of
50 nucleotides in length such that 48 of those nucleotides were in the same order as found
in the Query sequence. If all of the matches that met these criteria were in the same UniGene
cluster, and mapping data was available for this cluster, it is indicated in Table 1 under the
heading "Cytologic Band". Where a cluster had been further localized to a distinct cytologic
band, that band is disclosed; where no banding information was available, but the gene had
been localized to a single chromosome, the chromosome is disclosed.
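
By way of illustration only, the match criterion and the single-cluster rule described above may be sketched as follows. This is not the modified BLASTN program referred to in the text: it is a brute-force, ungapped comparison, and the function names, cluster identifiers, and mapping table are hypothetical.

```python
# Illustrative sketch only; a brute-force stand-in, not the modified BLASTN.
# Assumption: the 50-nucleotide segment is compared without gaps.


def is_exact_match(query: str, subject: str,
                   window: int = 50, min_identical: int = 48) -> bool:
    """True if some ungapped window shares at least min_identical positions."""
    q, s = query.upper(), subject.upper()
    for qi in range(len(q) - window + 1):
        for si in range(len(s) - window + 1):
            same = sum(a == b for a, b in
                       zip(q[qi:qi + window], s[si:si + window]))
            if same >= min_identical:
                return True
    return False


def location_to_report(matching_clusters, cluster_locations):
    """Return a location only if all matches fall in one mapped cluster.

    matching_clusters: cluster identifiers of every Subject that matched.
    cluster_locations: hypothetical lookup of cluster -> band or chromosome.
    """
    clusters = set(matching_clusters)
    if len(clusters) != 1:
        return None  # no matches, or matches spread over several clusters
    return cluster_locations.get(clusters.pop())
```

The nested loop is quadratic in sequence length and is intended only to make the 48-of-50 rule concrete.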
[0047] Once a presumptive chromosomal location was determined for a
polynucleotide of the invention, an associated disease locus was identified by comparison
with a database of diseases which have been experimentally associated with genetic loci.
The database used was the Morbid Map, derived from OMIM™ (supra). If the putative
chromosomal location of a polynucleotide of the invention (Query sequence) was
associated with a disease in the Morbid Map database, an OMIM reference identification
number was noted in column 9, Table 1, labeled "OMIM Disease Reference(s)". Table 5 is
a key to the OMIM reference identification numbers (column 1), and provides a description
of the associated disease in column 2.
[0048] Table 2 further characterizes certain encoded polypeptides of the
invention, by providing the results of comparisons to protein and protein family databases.
The first column provides a unique clone identifier, "Clone ID NO:", corresponding to a
cDNA clone disclosed in Table 1. The second column provides the unique contig
identifier, "Contig ID:", which allows correlation with the information in Table 1. The
third column provides the sequence identifier, "SEQ ID NO:X", for the contig
polynucleotide sequences. The fourth column provides the analysis method by which the
homology/identity disclosed in the row was determined. The fifth column provides a
description of PFam/NR hits having significant matches identified by each analysis.
Column six provides the accession number of the PFam/NR hit disclosed in the fifth
column. Column seven, "Score/Percent Identity", provides a quality score or the percent
identity of the hit disclosed in column five. Comparisons were made between
polypeptides encoded by polynucleotides of the invention and a non-redundant protein
database (herein referred to as "NR"), or a database of protein families (herein referred to
as "PFam"), as described below.

[0049] The NR database, which comprises the NBRF PIR database, the NCBI
GenPept database, and the SIB SwissProt and TrEMBL databases, was made
non-redundant using the computer program nrdb2 (Warren Gish, Washington University in
Saint Louis). Each of the polynucleotides shown in Table 1, column 3 (e.g., SEQ ID
NO:X or the 'Query' sequence) was used to search against the NR database. The computer
program BLASTX was used to compare a 6-frame translation of the Query sequence to
the NR database (for information about the BLASTX algorithm please see Altschul et al., J.
Mol. Biol. 215:403-410 (1990), and Gish et al., Nat. Genet. 3:266-272 (1993)). A
description of the sequence that is most similar to the Query sequence (the highest scoring 
'Subject') is shown in column five of Table 2 and the database accession number for that 
sequence is provided in column six. The highest scoring 'Subject' is reported in Table 2 if 
(a) the estimated probability that the match occurred by chance alone is less than 1.0e-07, 
and (b) the match was not to a known repetitive element. BLASTX returns alignments of 
short polypeptide segments of the Query and Subject sequences which share a high degree 
of similarity; these segments are known as High-Scoring Segment Pairs or HSPs. Table 2 
reports the degree of similarity between the Query and the Subject for each HSP as a
percent identity in Column 7. The percent identity is determined by dividing the number
of exact matches between the two aligned sequences in the HSP by the number
of Query amino acids in the HSP and multiplying by 100. The polynucleotides of SEQ ID
NO:X which encode the polypeptide sequence that generates an HSP are delineated by
columns 8 and 9 of Table 2.
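
By way of illustration only, the percent identity calculation and the two reporting conditions described above may be sketched as follows. The function names are hypothetical, and the representation of an HSP as two equal-length aligned strings with "-" marking gaps is an assumption made for this sketch.

```python
# Illustrative sketch only; function names and HSP representation are hypothetical.
# Assumption: an HSP is given as two aligned strings of equal length,
# with "-" marking a gap.


def percent_identity(query_hsp: str, subject_hsp: str) -> float:
    """Exact matches divided by Query amino acids in the HSP, times 100."""
    if len(query_hsp) != len(subject_hsp):
        raise ValueError("aligned HSP strings must have equal length")
    query_residues = sum(q != "-" for q in query_hsp)
    if query_residues == 0:
        raise ValueError("HSP contains no Query amino acids")
    exact = sum(q == s and q != "-" for q, s in zip(query_hsp, subject_hsp))
    return 100.0 * exact / query_residues


def is_reported(chance_probability: float, is_repetitive_element: bool,
                cutoff: float = 1.0e-07) -> bool:
    """Apply conditions (a) and (b) for reporting the highest scoring Subject."""
    return chance_probability < cutoff and not is_repetitive_element
```

For example, an HSP of eight Query amino acids with six exact matches yields a percent identity of 75.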

[0050] The PFam database, PFam version 5.2, (Sonnhammer et al., Nucl. Acids
Res., 26:320-322, (1998)) consists of a series of multiple sequence alignments, one
alignment for each protein family. Each multiple sequence alignment is converted into a
probability model called a Hidden Markov Model, or HMM, that represents the
position-specific variation among the sequences that make up the multiple sequence alignment
(see, e.g., R. Durbin et al., Biological sequence analysis: probabilistic models of proteins
and nucleic acids, Cambridge University Press, 1998 for the theory of HMMs). The
program HMMER version 1.8 (Sean Eddy, Washington University in Saint Louis) was 
used to compare the predicted protein sequence for each Query sequence (SEQ ID NO:Y 
in Table 1) to each of the HMMs derived from PFam version 5.2. An HMM derived from
PFam version 5.2 was said to be a significant match to a polypeptide of the invention if 
the score returned by HMMER 1.8 was greater than 0.8 times the HMMER 1.8 score 
obtained with the most distantly related known member of that protein family. The 
description of the PFam family which shares a significant match with a polypeptide of the 
invention is listed in column 5 of Table 2, and the database accession number of the PFam 
hit is provided in column 6. Column 7 provides the score returned by HMMER version 
1.8 for the alignment. Columns 8 and 9 delineate the polynucleotides of SEQ ID NO:X 
which encode the polypeptide sequence which shows a significant match to a PFam 
protein family. 
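
By way of illustration only, the significance rule described above may be written as a single comparison. The function and parameter names are hypothetical; the sketch takes the two HMMER 1.8 scores as given and does not run HMMER.

```python
# Illustrative sketch only; names are hypothetical and no HMMER call is made.


def is_significant_pfam_match(query_score: float,
                              most_distant_member_score: float,
                              fraction: float = 0.8) -> bool:
    """True if the Query score exceeds 0.8 times the score obtained with
    the most distantly related known member of the protein family."""
    return query_score > fraction * most_distant_member_score
```

For example, if the most distantly related known member of a family scored 50, a Query score of 45 would count as a significant match and a score of 39 would not.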

[0051] As mentioned, columns 8 and 9 in Table 2, "NT From" and "NT To",
delineate the polynucleotides of "SEQ ID NO:X" that encode a polypeptide having a
significant match to the PFam/NR database as disclosed in the fifth column of Table 2. In 
one embodiment, the invention provides a protein comprising, or alternatively consisting 
of, a polypeptide encoded by the polynucleotides of SEQ ID NO:X delineated in columns 
8 and 9 of Table 2. Also provided are polynucleotides encoding such proteins, and the 
complementary strand thereto.
[0052] The nucleotide sequence SEQ ID NO:X and the translated SEQ ID NO:Y
are sufficiently accurate and otherwise suitable for a variety of uses well known in the art
and described further below. For instance, the nucleotide sequences of SEQ ID NO:X are
useful for designing nucleic acid hybridization probes that will detect nucleic acid
sequences contained in SEQ ID NO:X or the cDNA contained in Clone ID NO:Z. These
probes will also hybridize to nucleic acid molecules in biological samples, thereby
enabling immediate applications in chromosome mapping, linkage analysis, tissue 
identification and/or typing, and a variety of forensic and diagnostic methods of the 
invention. Similarly, polypeptides identified from SEQ ID NO:Y may be used to generate 
antibodies which bind specifically to these polypeptides, or fragments thereof, and/or to 
the polypeptides encoded by the cDNA clones identified in, for example, Table 1. 
[0053] Nevertheless, DNA sequences generated by sequencing reactions can
contain sequencing errors. The errors exist as misidentified nucleotides, or as insertions or
deletions of nucleotides in the generated DNA sequence. The erroneously inserted or
deleted nucleotides cause frame shifts in the reading frames of the predicted amino acid
sequence. In these cases, the predicted amino acid sequence diverges from the actual
amino acid sequence, even though the generated DNA sequence may be greater than
99.9% identical to the actual DNA sequence (for example, one base insertion or deletion
in an open reading frame of over 1000 bases).
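
By way of illustration only, the effect described above may be demonstrated with a made-up open reading frame. The sequence below is invented for this sketch and is not a sequence of the invention; the translation uses the standard genetic code, and identity is taken as matching columns in an alignment containing a single gap.

```python
from itertools import product

# Illustrative sketch only; the ORF below is invented for demonstration.
# Standard genetic code, with codons enumerated in T, C, A, G order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(_BASES, repeat=3), _AMINO)}


def translate(dna: str) -> str:
    """Translate in frame 1, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


actual = "ATG" + "GCT" * 400                   # 1203-base made-up ORF
sequenced = actual[:600] + "A" + actual[600:]  # one erroneously inserted base

# 1203 of 1204 alignment columns match, i.e. more than 99.9% identity.
identity = 100.0 * len(actual) / len(sequenced)

actual_protein = translate(actual)
predicted_protein = translate(sequenced)
# The two translations agree up to the insertion and diverge after it.
first_difference = next(i for i, (a, p) in
                        enumerate(zip(actual_protein, predicted_protein))
                        if a != p)
```

In this sketch the first 200 predicted amino acids agree with the actual translation and every amino acid after the insertion differs, although the two nucleotide sequences are more than 99.9% identical.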

[0054] Accordingly, for those applications requiring precision in the nucleotide
sequence or the amino acid sequence, the present invention provides not only the
generated nucleotide sequence identified as SEQ ID NO:X, and a predicted translated
amino acid sequence identified as SEQ ID NO:Y, but also a sample of plasmid DNA
containing cDNA Clone ID NO:Z (deposited with the ATCC on June 5, 2000 and
given ATCC Deposit Nos. PTA-1982 and PTA-1985; and/or as set forth, for example, in
Tables 1, 6 and 7). The nucleotide sequence of each deposited clone can readily be
determined by sequencing the deposited clone in accordance with known methods.
Further, techniques known in the art can be used to verify the nucleotide sequences of
SEQ ID NO:X.